Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding
Benhammou, Yassir, Kalyan, Suman, Kumar, Sujay
Broadcast and media organizations increasingly rely on artificial intelligence to automate the labor-intensive processes of content indexing, tagging, and metadata generation. However, existing AI systems typically operate on a single modality, such as video, audio, or text, limiting their understanding of complex, cross-modal relationships in broadcast material. In this work, we propose a Multimodal Autoencoder (MMAE) that learns unified representations across text, audio, and visual data, enabling end-to-end automation of metadata extraction and semantic clustering. The model is trained on the recently introduced LUMA dataset, a fully aligned benchmark of multimodal triplets representative of real-world media content. By minimizing joint reconstruction losses across modalities, the MMAE discovers modality-invariant semantic structures without relying on large paired or contrastive datasets. We demonstrate significant improvements in clustering and alignment metrics (Silhouette, ARI, NMI) compared to linear baselines, indicating that reconstruction-based multimodal embeddings can serve as a foundation for scalable metadata generation and cross-modal retrieval in broadcast archives. These results highlight the potential of reconstruction-driven multimodal learning to enhance automation, searchability, and content management efficiency in modern broadcast workflows.
- Information Technology > Information Management (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
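The joint reconstruction objective in the abstract above can be sketched as follows. This is a minimal NumPy illustration, not the paper's architecture: the dimensions, the single linear encoder/decoder per modality, and the mean-fusion of latents are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions; the abstract does not specify them.
D_TEXT, D_AUDIO, D_IMAGE, D_LATENT = 32, 64, 128, 16

def init_linear(d_in, d_out):
    """Random linear map standing in for a trained encoder/decoder."""
    return rng.normal(scale=0.1, size=(d_in, d_out))

# One encoder and one decoder per modality, all sharing the latent size.
dims = [("text", D_TEXT), ("audio", D_AUDIO), ("image", D_IMAGE)]
encoders = {m: init_linear(d, D_LATENT) for m, d in dims}
decoders = {m: init_linear(D_LATENT, d) for m, d in dims}

def joint_reconstruction_loss(batch):
    """Sum of per-modality MSE losses through a shared fused latent code."""
    # Encode each modality, then fuse into one modality-invariant latent
    # (here by simple averaging, an assumption for this sketch).
    latents = [batch[m] @ encoders[m] for m in batch]
    z = np.mean(latents, axis=0)
    # Decode the fused code back into every modality and accumulate MSE;
    # minimizing this sum is what pushes the latent to be modality-invariant.
    loss = 0.0
    for m, x in batch.items():
        x_hat = z @ decoders[m]
        loss += np.mean((x_hat - x) ** 2)
    return loss

batch = {
    "text": rng.normal(size=(8, D_TEXT)),
    "audio": rng.normal(size=(8, D_AUDIO)),
    "image": rng.normal(size=(8, D_IMAGE)),
}
loss = joint_reconstruction_loss(batch)
```

In a real training loop the encoders and decoders would be optimized jointly by gradient descent on this loss; the sketch only shows how the three per-modality reconstruction errors combine into one objective.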
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
Zhang, Xinyu, Hofstätter, Sebastian, Lewis, Patrick, Tang, Raphael, Lin, Jimmy
Listwise rerankers based on large language models (LLMs) are the zero-shot state-of-the-art. However, current works in this direction all depend on GPT models, making them a single point of failure for scientific reproducibility. Moreover, this raises the concern that current research findings only hold for GPT models and not for LLMs in general. In this work, we lift this precondition and build, for the first time, effective listwise rerankers without any form of dependency on GPT. Our passage retrieval experiments show that our best listwise reranker surpasses the listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of those built on GPT-4. Our results also show that the existing training datasets, which were expressly constructed for pointwise ranking, are insufficient for building such listwise rerankers. Instead, high-quality listwise ranking data is required and crucial, calling for further work on building human-annotated listwise data resources.
- Asia > Mongolia (0.14)
- Europe > Norway (0.04)
- Europe > France > Île-de-France > Val-d'Oise > Roissy (0.04)
- (15 more...)
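The listwise reranking setup described above can be sketched with a sliding-window pass over the candidate list, a common way to fit long lists into an LLM's context. This is an illustrative assumption, not the paper's method: `rank_window` here is a toy lexical-overlap stand-in for the actual LLM call, and the window/stride scheme is a generic choice.

```python
def rank_window(query, passages):
    """Stand-in for an LLM listwise ranker: returns the indices of `passages`
    ordered from most to least relevant. Here, a toy term-overlap score."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(p.lower().split())) for p in passages]
    return sorted(range(len(passages)), key=lambda i: -scores[i])

def listwise_rerank(query, passages, window=4, stride=2):
    """Slide a window from the bottom of the list to the top, reordering each
    window in place so that strong passages bubble toward the front."""
    passages = list(passages)
    start = max(len(passages) - window, 0)
    while True:
        window_slice = passages[start:start + window]
        order = rank_window(query, window_slice)
        passages[start:start + window] = [window_slice[i] for i in order]
        if start == 0:
            break
        start = max(start - stride, 0)
    return passages

# Usage: the most query-relevant passage surfaces to the top of the list.
query = "open source rerankers"
candidates = [
    "cats sleep a lot",
    "open source rerankers beat closed ones",
    "weather today",
]
reranked = listwise_rerank(query, candidates, window=2, stride=1)
```

The key contrast with pointwise ranking is visible in `rank_window`: the ranker sees the whole window of passages at once and emits a permutation, rather than scoring each passage independently.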
Machine learning catching on in insurance, but challenges remain - Business Insurance
Emerging tools such as artificial intelligence and natural language processing are being used in the insurance sector, but costs remain high and there are questions about bias being introduced into machine learning, according to a speaker at the Public Risk Management Association's annual meeting Monday. "Everything is smart these days," said Brian Billings, vice president of predictive analytics in Ballwin, Missouri, for Midwest Employers Casualty Co., part of W. R. Berkley Corp., noting that devices such as cell phones and televisions now collect data from their users. "All of that technology is being driven by the use of data." Machine learning, including artificial intelligence and natural language processing, takes the data being collected and tries to predict some kind of outcome, Mr. Billings said, such as a numerical value or, in the case of the insurance sector, a claims scenario. With natural language processing, a model is trained to read text, Mr. Billings said.